Efficient Scoring and Ranking of Explanation for Data Exchange Errors in Vagabond

Authors

  • Zhen Wang
  • Francis Leung
Abstract

Data exchange is widely used in the big data era. One challenge in data exchange is identifying the true cause of data errors that arise during schema translation. The sheer volume of data and the number of schemas make it nearly impossible to find “the” correct solution. The Vagabond system was developed to address this problem; it uses best-effort methods to rank data exchange error explanations based on the likelihood that they are the correct solutions. Ranking is based on scoring functions that model certain properties of explanation sets. Examples of these properties include complexity (the size of the explanation) and side-effect size (the number of correct data values that would be affected by the changes). This thesis introduces three new scoring functions to increase the applicability of Vagabond under various data exchange scenarios. We prove that the monotonicity property required by Vagabond may not hold for some of the new scoring functions, so a new generic ranker is also introduced to efficiently rank error explanations for these new scoring functions as well as for future scoring functions that have the boundary property, that is, functions for which we can efficiently compute upper or lower bounds on the score of partial solutions. We also ran performance experiments on the new scoring functions and the new ranker. The experimental results show that the new scoring functions introduced in this thesis scale well.
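
The idea of ranking with bounds on partial solutions can be illustrated with a minimal Python sketch. This is not Vagabond's ranker: the function names, the data layout (one list of candidate explanations per error), the convention that a lower score is better, and the toy side-effect score are all assumptions made for illustration. The sketch only requires that `lower_bound` never exceeds the score of any completion of a partial solution and equals the true score once the solution is complete.

```python
import heapq
from itertools import count


def rank_explanation_sets(candidates, lower_bound):
    """Yield (score, explanation_set) pairs in ascending score order.

    candidates: one list of candidate explanations per error (hypothetical layout).
    lower_bound: maps a tuple of chosen explanations (a partial solution) to a
        value that never exceeds the score of any completion, and that equals
        the true score when the tuple covers every error.
    """
    tie = count()  # unique tie-breaker so the heap never compares explanations
    heap = [(lower_bound(()), next(tie), ())]
    while heap:
        bound, _, partial = heapq.heappop(heap)
        if len(partial) == len(candidates):
            # Every other entry has a bound at least this large,
            # so this complete set is the next best one.
            yield bound, partial
            continue
        for explanation in candidates[len(partial)]:
            extended = partial + (explanation,)
            heapq.heappush(heap, (lower_bound(extended), next(tie), extended))


def side_effect_size(partial):
    """Toy score: number of distinct data values touched by the chosen explanations.

    Adding explanations can only grow this set, so the value for a partial
    solution is a valid lower bound for all of its completions.
    """
    affected = set()
    for _name, side_effects in partial:
        affected |= side_effects
    return len(affected)


# Two errors, each with two hypothetical candidate explanations.
candidates = [
    [("e1a", frozenset({"t1", "t2"})), ("e1b", frozenset({"t3"}))],
    [("e2a", frozenset({"t3"})), ("e2b", frozenset({"t1", "t4", "t5"}))],
]

ranked = [
    (score, tuple(name for name, _ in solution))
    for score, solution in rank_explanation_sets(candidates, side_effect_size)
]
print(ranked[0])  # best-ranked explanation set
```

Because the generator is lazy, a caller that needs only the top few explanation sets can stop early without enumerating every combination. The toy score here happens to be monotone; for a non-monotone scoring function, a separate bound function with the same two guarantees would be supplied instead.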

Similar resources

Explanation of Residents' Experiences Concerning Medication Errors in Neonatal Intensive Care Units: A Qualitative Study

Introduction: Medication errors are a potentially hazardous accident for the patients and can be used as a measure of patient safety in the healthcare system. Neonates are the most vulnerable population because of their body size. The experiences and views of those involved in the healthcare system can be a significant source of information gathering and planning in preventing medication errors...

Combination of DEA and ANP-QUALIFLEX Methods to determine the most Efficient Portfolio (Case study: Tehran Stock Exchange)

The existence of an active and prosperous capital market is always recognized as one of the signs of international development in the countries. The most important issue faced by investors in these markets is the decision to choose the appropriate securities for investment and formation of optimal portfolio. The rating of companies accepted in stock exchange is a complete mirror of their status...

Debugging Data Exchange with Vagabond

In this paper, we present Vagabond, a system that uses a novel holistic approach to help users to understand and debug data exchange scenarios. Developing such a scenario is a complex and labor-intensive process where errors are often only revealed in the target instance produced as the result of this process. This makes it very hard to debug such scenarios, especially for non-power users. Vaga...

Ranking Efficient Decision Making Units in Data Envelopment Analysis based on Changing Reference Set

One of the drawbacks of Data Envelopment Analysis (DEA) is the lack of discrimination among efficient Decision Making Units (DMUs). One method for removing this difficulty, called changing the reference set, was proposed by Jahanshahloo et al. (2007). The method has some drawbacks. In this paper, a modified method and a new method to overcome these problems are suggested. The main advantage of...

Evaluating and Ranking the Firms in Chemical Industry Listed in Tehran Stock Exchange with TOPSIS

With the refinement of human knowledge in economics, the concept of efficiency and its measurement have developed over the past two decades, based on different theories and practices. In economics, efficiency means the maximum possible output from a given amount of input. Efficiency is very important for developing countries because these countries face a shortage of...

Publication date: 2014